Interactive-Time Similarity Search for Large Image Collections Using Parallel VA-Files

Authors

Roger Weber

Klemens Böhm

Hans-Jörg Schek

Abstract

Nearest-neighbor search (NN-search) plays a key role in content-based retrieval. As a first contribution, this article shows that NN-search is a meaningful implementation of similarity search, even if features are high-dimensional. But NN-search over high-dimensional features is of linear complexity, and query response times have not been satisfactory for large collections of multimedia objects. Building on the VA-File, this article investigates parallel NN-search in a Network of Workstations (NOW). It identifies various design alternatives for such a search engine and evaluates them. The alternatives basically relate to data placement and division of work among components. We also use Amdahl’s law to predict the speedup and response times for a given data set and a given setup. Because of the scan-based nature of the VA-File, one might expect an improvement almost linear in the number of components. But the best speedup we have observed is almost 30-fold for a NOW with only three components. The effect is due to the elimination of the IO-bottleneck. From another perspective, our solution provides interactive-time similarity search, i.e., a search through 900 MB of feature data takes about one second in a NOW with three components.
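
As a rough illustration of the kind of prediction the abstract mentions, the following is a minimal sketch of Amdahl's law in its classic textbook form. It is not the authors' model: the function names and the `parallel_fraction` parameter are assumptions made for this example, and the abstract does not state which fraction of the work is parallelizable. Note that the classic form caps the speedup at the number of components, so it cannot by itself produce the almost 30-fold figure reported for three components, which the abstract attributes to the elimination of the IO-bottleneck.

```python
def amdahl_speedup(parallel_fraction: float, n_components: int) -> float:
    """Classic Amdahl's law: speedup = 1 / ((1 - p) + p / n).

    parallel_fraction (p) is the share of single-node run time that can be
    spread over the components; it is a hypothetical input, not a value
    taken from the paper.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_components < 1:
        raise ValueError("n_components must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_components)


def predicted_response_time(single_node_seconds: float,
                            parallel_fraction: float,
                            n_components: int) -> float:
    """Predicted response time: single-node time divided by the speedup."""
    return single_node_seconds / amdahl_speedup(parallel_fraction, n_components)


if __name__ == "__main__":
    # Illustrative inputs only; these are not measurements from the paper.
    for n in (1, 2, 3):
        print(n, amdahl_speedup(0.9, n), predicted_response_time(10.0, 0.9, n))
```

Under this form, a fully parallel scan (`parallel_fraction = 1.0`) on three components yields a speedup of 3, which matches the "almost linear" expectation the abstract describes for a scan-based method.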


Similar resources

Effective Browsing of Image Search Results with Visual and Diverse Summarization through Clustering

With the unprecedented growth in the production of digital images and the use of multimedia references, the demand for image and subject search has increased. Systematic processing of this information is a basic prerequisite for its effective analysis, organization and management. Likewise, large collections of images have been made available on the Web and many search engines have provided the poss...


VA-Files vs. R*-Trees in Distance Join Queries

In modern database applications the similarity of complex objects is examined by performing distance-based queries (e.g. nearest neighbour search) on data of high dimensionality. Most multidimensional indexing methods have failed to efficiently support these queries in arbitrary high-dimensional datasets (due to the dimensionality curse). Similarity join queries and K closest pairs queries are ...


Faster Exact Histogram Intersection on Large Data Collections Using Inverted VA-Files

Most indexing structures for high-dimensional vectors used in multimedia retrieval today rely on determining the importance of each vector component at indexing time in order to create the index. However for Histogram Intersection and other important distance measures this is not possible because the importance of vector components depends on the query. We present an indexing structure inspired...


Similarity-based visualization of large image collections

Effective techniques for organizing and visualizing large image collections are in growing demand as visual search gets increasingly popular. Targeting an online astronomy archive with thousands of images, we present our solution for image search and clustering based on the evaluation of image similarity using both visual and textual information. Time-consuming image similarity computation is a...


High Performance Implementation of Fuzzy C-Means and Watershed Algorithms for MRI Segmentation

Image segmentation is one of the most common steps in digital image processing. Many image segmentation algorithms (e.g., thresholding, edge detection, and region growing) are employed for classifying a digital image into different segments. In this connection, finding a suitable algorithm for medical image segmentation is a challenging task, mainly due to the noise, low contrast, and steep...




Publication date: 2000
